computational number theory In mathematics and computer science, computational number theory, also known as algorithmic number theory, is the study of computational methods for investigating and solving problems in number theory and arithmetic geometry, including algorithm ...

, the index calculus algorithm is a

probabilistic Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...

for computing

discrete logarithm In mathematics, for given real numbers ''a'' and ''b'', the logarithm log''b'' ''a'' is a number ''x'' such that . Analogously, in any group ''G'', powers ''b'k'' can be defined for all integers ''k'', and the discrete logarithm log''b' ...

s. Dedicated to the discrete logarithm in

(\mathbb/q\mathbb)^*

where

q

is a prime, index calculus leads to a family of algorithms adapted to finite fields and to some families of elliptic curves. The algorithm collects relations among the discrete logarithms of small primes, computes them by a linear algebra procedure and finally expresses the desired discrete logarithm with respect to the discrete logarithms of small primes.

Description

Roughly speaking, the

discrete log In mathematics, for given real numbers ''a'' and ''b'', the logarithm log''b'' ''a'' is a number ''x'' such that . Analogously, in any group ''G'', powers ''b'k'' can be defined for all integers ''k'', and the discrete logarithm log''b' ...

problem asks us to find an ''x'' such that

g^x \equiv h \pmod

, where ''g'', ''h'', and the modulus ''n'' are given. The algorithm (described in detail below) applies to the group

(\mathbb/q\mathbb)^*

where ''q'' is prime. It requires a ''factor base'' as input. This ''factor base'' is usually chosen to be the number −1 and the first ''r'' primes starting with 2. From the point of view of efficiency, we want this factor base to be small, but in order to solve the discrete log for a large group we require the ''factor base'' to be (relatively) large. In practical implementations of the algorithm, those conflicting objectives are compromised one way or another. The algorithm is performed in three stages. The first two stages depend only on the generator ''g'' and prime modulus ''q'', and find the discrete logarithms of a ''factor base'' of ''r'' small primes. The third stage finds the discrete log of the desired number ''h'' in terms of the discrete logs of the factor base. The first stage consists of searching for a set of ''r''

linearly independent In the theory of vector spaces, a set of vectors is said to be if there is a nontrivial linear combination of the vectors that equals the zero vector. If no such linear combination exists, then the vectors are said to be . These concepts are ...

''relations'' between the factor base and power of the

generator Generator may refer to: * Signal generator, electronic devices that generate repeating or non-repeating electronic signals * Electric generator, a device that converts mechanical energy to electrical energy. * Generator (circuit theory), an eleme ...

''g''. Each relation contributes one equation to a

system of linear equations In mathematics, a system of linear equations (or linear system) is a collection of one or more linear equations involving the same variable (math), variables. For example, :\begin 3x+2y-z=1\\ 2x-2y+4z=-2\\ -x+\fracy-z=0 \end is a system of three ...

in ''r'' unknowns, namely the discrete logarithms of the ''r'' primes in the factor base. This stage is

embarrassingly parallel In parallel computing, an embarrassingly parallel workload or problem (also called embarrassingly parallelizable, perfectly parallel, delightfully parallel or pleasingly parallel) is one where little or no effort is needed to separate the problem i ...

and easy to divide among many computers. The second stage solves the system of linear equations to compute the discrete logs of the factor base. A system of hundreds of thousands or millions of equations is a significant computation requiring large amounts of memory, and it is ''not'' embarrassingly parallel, so a

supercomputer A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second ( FLOPS) instead of million instructions ...

is typically used. This was considered a minor step compared to the others for smaller discrete log computations. However, larger discrete logarithm records were made possible only by shifting the work away from the linear algebra and onto the sieve (i.e., increasing the number of equations while reducing the number of variables). The third stage searches for a power ''s'' of the generator ''g'' which, when multiplied by the argument ''h'', may be factored in terms of the factor base ''g^sh'' = (−1)^''f''₀ 2^''f''₁ 3^''f''₂···''p''_''r''^''f''_''r''. Finally, in an operation too simple to really be called a fourth stage, the results of the second and third stages can be rearranged by simple algebraic manipulation to work out the desired discrete logarithm ''x'' = ''f''₀log_''g''(−1) + ''f''₁log_''g''2 + ''f''₂log_''g''3 + ··· + ''f''_''r''log_''g''''p_r'' − ''s''. The first and third stages are both embarrassingly parallel, and in fact the third stage does not depend on the results of the first two stages, so it may be done in parallel with them. The choice of the factor base size ''r'' is critical, and the details are too intricate to explain here. The larger the factor base, the easier it is to find relations in stage 1, and the easier it is to complete stage 3, but the more relations you need before you can proceed to stage 2, and the more difficult stage 2 is. The relative availability of computers suitable for the different types of computation required for stages 1 and 2 is also important.

Applications in other groups

The lack of the notion of ''prime elements'' in the group of points on

elliptic curves In mathematics, an elliptic curve is a smooth, projective, algebraic curve of genus one, on which there is a specified point . An elliptic curve is defined over a field and describes points in , the Cartesian product of with itself. If the ...

makes it impossible to find an efficient ''factor base'' to run index calculus method as presented here in these groups. Therefore this algorithm is incapable of solving discrete logarithms efficiently in elliptic curve groups. However: For special kinds of curves (so called

supersingular elliptic curve In algebraic geometry, supersingular elliptic curves form a certain class of elliptic curves over a field of characteristic ''p'' > 0 with unusually large endomorphism rings. Elliptic curves over such fields which are not supersingular ar ...

s) there are specialized algorithms for solving the problem faster than with generic methods. While the use of these special curves can easily be avoided, in 2009 it has been proven that for certain fields the discrete logarithm problem in the group of points on ''general'' elliptic curves over these fields can be solved faster than with generic methods. The algorithms are indeed adaptations of the index calculus method.

The algorithm

Input: Discrete logarithm generator

g

, modulus

q

and argument

h

. Factor base

\

, of length

r+1

.
Output:

x

such that

g^x=h \mod q

. * relations ← empty_list * for

k = 1, 2, \ldots

** Using an

integer factorization In number theory, integer factorization is the decomposition of a composite number into a product of smaller integers. If these factors are further restricted to prime numbers, the process is called prime factorization. When the numbers are suf ...

algorithm optimized for

smooth numbers In number theory, an ''n''-smooth (or ''n''-friable) number is an integer whose prime factors are all less than or equal to ''n''. For example, a 7-smooth number is a number whose every prime factor is at most 7, so 49 = 72 and 15750 = 2 × 32 × 5 ...

, try to factor

g^k \bmod  q

(Euclidean residue) using the factor base, i.e. find

e_i

's such that

g^k \bmod q= (-1)^2^3^\cdots p_r^

** Each time a factorization is found: *** Store

k

and the computed

e_i

's as a vector

(e_0,e_1,e_2,\ldots,e_r,k)

(this is a called a relation) *** If this relation is

to the other relations: **** Add it to the list of relations **** If there are at least

r+1

relations, exit loop * Form a matrix whose rows are the relations * Obtain the

reduced echelon form In linear algebra, a matrix is in echelon form if it has the shape resulting from a Gaussian elimination. A matrix being in row echelon form means that Gaussian elimination has operated on the rows, and column echelon form means that Gaussian el ...

of the matrix ** The first element in the last column is the discrete log of

-1

and the second element is the discrete log of

2

and so on * for

s = 1, 2, \ldots

** Try to factor

g^s h \bmod q= (-1)^2^3^\cdots p_r^

over the factor base ** When a factorization is found: *** Output

x = f_0 \log_g(-1) + f_1 \log_g2 + \cdots + f_r \log_g p_r - s.

Complexity

Assuming an optimal selection of the factor base, the expected running time (using

L-notation ''L''-notation is an asymptotic notation analogous to big-O notation, denoted as L_n alpha,c/math> for a bound variable n tending to infinity. Like big-O notation, it is usually used to roughly convey the rate of growth of a function, such as the ...

) of the index-calculus algorithm can be stated as

L_n /2,\sqrt+o(1)

History

The basic idea of the algorithm is due to Western and Miller (1968), which ultimately relies on ideas from Kraitchik (1922). The first practical implementations followed the 1976 introduction of the Diffie-Hellman cryptosystem which relies on the discrete logarithm. Merkle's Stanford University dissertation (1979) was credited by Pohlig (1977) and Hellman and Reyneri (1983), who also made improvements to the implementation. Adleman optimized the algorithm and presented it in the present form.

The Index Calculus family

Index Calculus inspired a large family of algorithms. In finite fields

\mathbb_

with

q=p^n

for some prime

p

, the state-of-art algorithms are the Number Field Sieve for Discrete Logarithms,

L_\left /3,\sqrt[3,\right.html" ;"title=".html" ;"title="/3,\sqrt[3">/3,\sqrt[3,\right">.html" ;"title="/3,\sqrt[3">/3,\sqrt[3,\right/math>, when p is large compared to q, the function field sieve, L_q\left /3,\sqrt[3,\right.html" ;"title=".html" ;"title="/3,\sqrt[3">/3,\sqrt[3,\right">.html" ;"title="/3,\sqrt[3">/3,\sqrt[3,\right/math>, and Joux, A. Joux, ''A new index calculus algorithm with complexity'' L(1/4+o(1))''in very small characteristic'

/ref>

L_\left[1/4+\varepsilon,c\right]

for

c>0

, when

p

is small compared to

q

and the Number Field Sieve in High Degree,

L_q /3,c /math> for c>0 when p is middle-sided. Discrete logarithm in some families of elliptic curves can be solved in time L_q\left /3,c\right /math> for c>0, but the general case remains exponential.

External links

Discrete logarithms in finite fields and their cryptographic significance
by

Andrew Odlyzko Andrew Michael Odlyzko (Andrzej Odłyżko) (born 23 July 1949) is a Polish-American mathematician and a former head of the University of Minnesota's Digital Technology Center and of the Minnesota Supercomputing Institute. He began his career in ...

Discrete Logarithm Problem
by Chris Studholme, including the June 21, 2002 paper "The Discrete Log Problem". *

Notes

{{DEFAULTSORT:Index Calculus Algorithm Group theory